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ABSTRACT 

Achieving maximum scientific results from tlie overwlielming volume of astro- 
nomical data to be acquired over the next few decades will demand novel, fully 
automatic methods of data analysis. Artificial intelligence approaches hold great 
promise in contributing to this goal. Here we apply neural network learning tech- 
nology to the specific domain of eclipsing binary (EB) stars, of which only some 
hundreds have been rigorously analyzed, but whose numbers will reach millions 
in a decade. Well-analyzed EBs are a prime source of astrophysical information 
whose growth rate is at present limited by the need for human interaction with 
each EB data-set, principally in determining a starting solution for subsequent 
rigorous analysis. We describe the artificial neural network (ANN) approach 
which is able to surmount this human bottleneck and permit EB-based astro- 
physical information to keep pace with future data rates. The ANN, following 
training on a sample of 33,235 model light curves, outputs a set of approximate 
model parameters (T2/T1, (_Ri -|- R2)/a, esina;, ecosu, and sini) for each in- 
put light curve data-set. The whole sample is processed in just a few seconds 
on a single 2GHz CPU. The obtained parameters can then be readily passed to 
sophisticated modeling engines. We also describe a novel method polyfit for pre- 
processing observational light curves before inputting their data to the ANN and 
present the results and analysis of testing the approach on synthetic data and on 
real data including fifty binaries from the Catalog and Atlas of Eclipsing Binaries 
(CALEB) database and 2580 light curves from OGLE survey data. The success 
rate, defined by less than a 10% error in the network output parameter values, 
is approximately 90% for the OGLE sample and close to 100% for the CALEB 
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sample - sufficient for a reliable statistical analysis. The code is made available to 
the public. Our approach is applicable to EB light curves of all classes; this first 
paper in the Eclipsing Binaries via Artificial Intelligence (EBAI) series focuses on 
detached EBs, which is the class most challenging for this approach. 

Subject headings: methods: data analysis — methods: numerical — binaries: 
eclipsing — stars: fundamental parameters 



1. Introduction 



Over the past decade advances in obser vational technologies, comput e rs and e c lipsin g 
binary (EB) analysis codes (e. g., WD code: IWilson and Devinneyl (Il97ll ): IWilsonI (119931 ): 
Wilson and Van Hammd (120071 )) have enabled the accumulation of an impressive sample of 
fundamental stellar data. Careful analysis of EB light curves has produced fundamental 
stellar properties, tests of stellar evolution theories, accurate distances within the Galaxy 
and to external g alaxies, as well as p roviding tests of stellar structure models and general 
relativity (see e.g. lGuinan et al.ll2007l ). Despite the importance of these astrophysical results, 
only some hundreds of EBs have been subject to the requisite analysis and the cumulative 
results populate the astrophysical parameter space sparsely. Current standard practice re- 
quires significant human interaction with EB light curve data, particularly in the initial 
stages of analysis, which defines a "rate-determining step" in the process generating these 
astrophysical results. 

By 2020, the observational bounty from gr ound- and space-bas ed programs such as Opti- 
cal Gravitational Lensing E xperiment (OGLE:IUdalski et al.lll997n . Experience de Recherche 
d'Ob.iet s Sombres 
(ASAS; 



Poimanski 



ERO S; iPalanque-Delabrouille et al. 



2002), Pan- Starrs (IKaiser et al.ll2002l ). Large Synoptic Survey Te l escop e 



19981 ). All Sky Automated Survey 



TvsonI 120021). and t heir space counterparts Hipparco s ( iPerryman and ESAl 119971 ) 



(LSST; 

Kepler ( iBorucki et al.l l2004j ) and Gaia (IPerryman et al. 



200 ll ) will include millions of new 



EB light curves, even catching some EBs in fleeting stages of stellar evolution. Powerful and 
mature codes for light curve analysis stand ready to mine this enormous and rich vein of 
new astrophys ical informat i on, and pioneering s t eps in automating t hese t ools have already 
been t aken by iDevorl (120051 ) ; iTamuz et al.l (120061 ) ; iMazeh et al.l (120061 ) (see iPrsa and Zwitter 
( 120061 ) for a concise overview of current automated techniques). A key issue is to effi- 
ciently determine an initial set of model parameters for every light curve as input to the 
automated analysis process. Starting values have been conventionally supplied by the ana- 
lyst / astronomer using expert knowledge typically guided by checks with light curve synthesis 
codes, a time-intensive process that is certainly out of the question for the coming fire-hose 
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of EB data. 



Approaches to automating; light curve solutions have taken various forms to date. 
Wyithe and WilsonI (j200ll . |2002| ). in their work to establish the best distance indicators 
among detached and semi-detached binaries in the Small Magellanic Cloud, obtained start- 
ing parameters for the rigorous WD model by comparing each candidate light curve with 
a set of template model light curves, sending the best match to an automated version of 
the WD differential corrector program DC. This could be computationally prohibitive to 
apply to the expected large future data-sets, even if clever pruning reduced the number of 
comparison templates. Empl oying less rigo rous physical models, of course, is one approach 
to computational efficiency. iDevorl (120051 ) provides a critical discussion of an automated 
pipeline employing a simple model of spherical stars without tidal or reflection physics, 
whose starting values are obtained from a n initial gues s and then refined using a downhill 
simplex method with simulated annealing. iTamuz et al.l (120061 ) employ the EBOP ellipsoidal 
model (jPopper and Etzell llQSll ). Using this engine, they arrive at initial solutions after a 
combination of grid search, gradient descent and geometrical analysis of the LC. 

The challenge is therefore to gain the advantages of a sophisticated model yet keep 
processing time limited. The Eclipsing Binaries via Artificial Intelligence (EBAI) project 
introduces artificial neural networks (ANNs) that are trained on the rigorous WD physical 
model and are computationally extremely efficient, towards fully automating the solution 
process for EBs. 

In this first paper in the EBAI series we describe the basic ANN concepts and pro- 
cedures for applying ANNs to detached EBs. We present results of applying the trained 
ANN to a set of 10,000 synthetic detached EBs, to 50 detached real world binaries from 
the Cata log and Atlas of Eclipsing Binaries (CALEbEI), and to the set of 2580 OGLE LMC 
binaries (IWyrzykowski et al.ll2003l ) classified as detached. Subsequent papers will deal sim- 
ilarly with semi-detached systems and overcontact EBs, and address automated light curve 
classification. 



2. The EBAI project: concept and implementation 

Advances in Artificial Intelligence (AI) and the continued operation of Moore's law that 
predicts doubling of the processing power every two years have created the opportunity for 
significant progress in solving the types of problems that are limited by the lack of human 
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capital. A new approach, the Intelligent Data Pipeline (IDP), is being prototyped in the 
domain of EBs which uses AI techniques to operate autonomously on large observational 
data-sets to produce results of astrophysical value. The IDP is designed to handle the 
complete process of variable discovery, c lassification of variabi l ity an d management of the 
solution process for the discovered EBs (jPevinney et al.ll2005l . l2006l : cf. Fig. [1]). The IDP 
employs ANNs in the processing modules, while the supervisory knowledge, now implicit in 
humans, is encoded in control modules as rules appropriate for each processing module. The 
supervisory modules have the task of keeping the process on track and providing physically 
meaningful results through each phase of the processing pipeline. This paper concentrates 
on the Solution Estimator block: given the set of classified input light curves, it finds a set 
of physical and geometric parameters that best match the input data. 



2.1. Artificial Neural Networks 

An artificial neural network is a simple construct: it is a stack of interconnected layers 
(cf. Fig. [2]). Each layer is an array of processing elements called units. These units propagate 
the signal between layers by weighted connections. The units perform non-linear mapping of 
input data to output parameters. 

The input to the network is a light curve - an array of photometric data points at 
equidistant phase intervals. These data are the stimulus of the input layer: each unit on the 
layer acquires a value of the given input array element. Once the input layer is populated, 
propagation to the hidden layer begins. The stimulus of the given unit on the hidden layer 
is a weighted sum of outputs from units on the current layer. This stimulus is then passed 
through a non-linear activation function that determines the extent of stimulation of the 
given unit. There is a connection from every unit on the input layer to every unit on the 
hidden layer, and each of these connections has a corresponding weight assigned to it. Once 
the signal has been propagated to the hidden layer, the propagation continues to the output 
layer: each unit acquires a value that is a weighted sum of outputs from units on the hidden 
layer, passed through the activation function. The network thus maps its input to its output 
by propagating the stimulus via weighted connections that are passed through activation 
functions. Given the layer-to-layer connection weights, the propagation is basically a matter 
of summation and multiplication. The power comes from the ability of networks to provide 
non-linear mapping between their input and their output by using non-linear activation 
functions. The most commonly used activation functions are sigmoid curves, i.e. functions 
of the type f{x) = 1/[1 + exp(— (a; — /i)/T)]. 

The goal of ANNs is to have output layer values that have some representative sig- 
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Fig. 1. — Intelligent Data Pipeline (IDP). Complete survey data is piped through a period 
finder algorithm that is controlled by a rule-based system. All variable sources are then 
passed to the ANN-based classifier. Light curves consistent with EB signatures are passed to 
the Solution Estimator block (focus of this paper). The output from the Solution Estimator 
is input to the sophisticated WD-based engine PHOEBE. For detached EBs the IDP produces 
a catalog of 5 principal parameters - T2/T1, p\ + p2, esinu;, ecoscj, and sinz. These are 
complemented by other survey /mission data and follow-up observations to obtain full-scale 
astrophysical results. 
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Fig. 2. — The topology of a three-layer Artificial Neural Network (ANN). Processing units 
(nodes) on the input layer are populated with flux observations that are equidistant in phase. 
A weighted sum of values on the input layer, yj = J2k '^fk^k, stimulates each unit, j = 1 . . . Z, 
on the hidden layer. The amount of stimulus is determined by the activation function . 
In our case this is a sigmoid function, Aj{yj) = 1/[1 + exp(— (y^ — /i)/r)], with parameters 
fi and T chosen in such a way that is mapped onto the [—1,1] interval. Processing units 
on the hidden layer are then populated with hj = A^{yj). An analogous propagation takes 
place between the hidden layer and the output layer, using the same type of the activation 
function. The output layer of a trained network then contains the network's best guess at 
physical parameters of the data that have populated the input layer. The ANN is thus a 
mapping from observations to the physical parameters of the observed system. 
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nificance of the input array. In our case the output is an array of model parameters that 
correspond to the input hght curve. Since the output values depend on the connection 
weights, the task is to determine these weights. This is where back-propagation comes into 
play. Assume we have a sample of several thousand exemplars (input arrays for which we 
know the corresponding physical parameters) - obtained, for example, by computing theo- 
retical light curves, or by using real data with reliable model solutions. For each exemplar we 
perform forward-propagation and compare the results output by the network with the true 
ones. We then modify the weights so that the discrepancy between the results is reduced 
for the whole sample. This iterative procedure is called the training or machine learning 
phase. It is the only computation-intensive block and it is performed only once. Once the 
weights are determined and the network reliably reproduces the expected results, the net- 
work is ready to recognize input never seen before. Hundreds of thousands of light curves 
can subsequently be processed in a matter of seconds on a single CPU. 

Depending on their construction, ANNs can serve either as classifiers or regression cal- 
culators. The classification mode features bins that are representative of a group of objects 
instead of continuous output parameters. The regression mode, on the other hand, imposes 
no limits on the domain of parameter values and provides a non-linear mapping between the 
input layer and the output layer. The IDP classifier is an example of the classifying ANN, 
whereas the solution estimator ANN (the main topic of this paper) is a typical regression 
ANN. 

The network described here, a basic three-layer back-propagating network (BPN), is re- 
markably robust for a diversity of non-linear problems. Different network topologies, such as 
multiple hidden layers, as well as more complicated connection strategies, advanced training 
approaches and other variations in general do not bring significant improvements to the ba- 
sic mod el. For a thorough discussion on neural networks please refer to authoritative books 



such as iFreeman and Skapural (1199 ll ) . 



2.2. Choice of ANN principal parameters 

Solving the inverse problem for EEs (determining the set of model parameters that 
best fit the observations) is notoriously difficult because parameter inter-correlations and 
degeneracy are inherent to the problem and cannot be avoided. Thus considerable care 
must be taken when selecting the network's output parameters. For example, if we focus on 
detached EB light curves, parameters such as the mass ratio and semi-major axis cannot be 
obtained. Consequently, absolute-scale parameters such as the radii and individual surface 
potentials cannot be obtained either. Moreover, if there is only one input light curve, there 
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is no information on the absolute values of individual components' temperatures. 

The role of ANNs is to provide estimates for parameter values as a starting point for 



specialized minimizatio n methods, such as NMS or Powell's method (jPrsa and Zwitterll2005 



Prsa and Zwitteii 120071 ). therefore all second order parameters such as gravity darkening, 
reflection effect, or spots, can also be ignored. In fact, because of degeneracy it seems 
that, out of 40 or so parameters involved in computing a light curve, there is virtually no 
parameter that would be ideal to appear on the ANN's output list. To stay true to the idea 
of artificial intelligence, and to overcome these difficulties, we must take a step back: what 
guides humans to recognize (and implicitly classify and analyze) a light curve? ANN is a 
recognition algorithm, so it has to connect to that which triggers our recognition capability. 

What guides humans to recognize light curves are the ratios and the geometry. Accord- 
ingly, we sought a set of parameters that would act as a bridge between the features that 
trigger our subjective recognition, and physical and geometrical parameters of the system. 
Fig. [3] schematically presents the shape of a light curve and the parameters that determine 
such a shape. Namely, the surface brightness ratio B2/B1 determines the ratio of depths of 
both eclipses; the sum of relative radii pi + p2 determines eclipse width; eccentricity e and 
the argument of periastron u determine the separation between the eclipses and also the 
ratio of their widths; finally, the inclination i determines the overall shape of the eclipse - U 
vs. V-shaped minima - and the overall amplitude of the light curve. We have selected these 
indicators to form the five principal parameters of the ANN. They are summarized in Table 

m 



Effective temperature ratio. Although the ratio of depths of both eclipses depends di- 
rectly on the surface brightness ratio and is thus a complicated function of temperature, 
we approximate it here with the temperature ratio. The rationale behind this choice is 
threefold: 1) the equivalence between the temperature ratio and the surface brightness 



Table 1: Ranges of principal parameters for detached eclipsing binaries. 



Parameter: 


Range: 


Description: 


a 


= T2/T, 


[ 0.1, 1.0] 


effective temperature ratio 


/3 


= Pl+ P2 


[0.01, 0.5] 


sum of relative radii, p = R/a 


7 


= e sin 


[-0.3, 0.3] 


radial eccentricity 


6 


= e cos u 


[-0.3, 0.3] 


tangential eccentricity 


e 


= sin i 


[0.85, 1.0] 


sine of the inclination 




Fig. 3. — Principal parameters of a light curve that bridge our subjective recognition with 
impersonal parameters of the EB system. 
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ratio increases as the deformation of stellar surfaces decreases and this approximation 
is thus valid for well detached binaries; 2) there exists a direct mapping from local 
temperature distribution over the stellar surface to surface brightness, but there is no 
direct mapping from surface brightness to local temperatures. To obtain the tempera- 
tures (that are input parameters to the model) from surface brightness ratio would thus 
imply a computationally intensive iterative scheme that would significantly prolong the 
overall run-time; and 3) since we are basing our analysis on a single light curve, there is 
no inherent way to calibrate the temperatures and thus no way to obtain the effective 
temperature of the binary. As consequence, even if we used surface brightness as the 
network's output parameter, we would still have to assume one of the temperatures 
and therefore still suffer from systematic effects. 

Sum of relative radii. The absolute scale of the system cannot be unambiguously ob- 
tained from a single light curve, nor can we typically get individual fractional radii. 
The reason for the latter is the degeneracy between the radii: any combination can 
reproduce the curve as long as their sum is kept constant. This degeneracy is broken 
in special cases, i.e. total eclipses, pronounced proximity effects, or eccentric orbits, 
since the signature of individual radii is predominantly contained in the ingress and 
egress shape of the eclipse. For close detached, semi-detached, and ovcrcontact systems 
this limitation does not exist. In general, however, the only quantity attainable from 
the width of the eclipses of well detached EBs is the sum of relative (fractional) radii 
Pi+P2 = (i?i + i?2)/a- 

Radial component of eccentricity. Were the eccentricity e and argument of periastron 
u) to enter the training separately, problems would arise at small eccentricities; e.g., as 
e gets smaller, uj becomes indeterminate. Furthermore, the orthogonalized components 
e sin UJ and e cos uj have distinct effects on the shape of the light curve. Radial eccen- 
tricity esino; is directly proportional to the ratio (^2 — di)/{d2 + di) of the durations 
of both minima. 

Tangential component of eccentricity. The tangential orthogonahzed component e cos a; 
is directly related to the displacement of the minima. This component is a first order 
effect, whereas the echpse width change is a second order effect. 

Inclination. Instead of using the inclination itself, it proves better to use sini because of 
its geometric significance, that is the angle analogous to the sum of fractional radii 
that determines the shape and duration of the eclipse. 

For the purpose of ANN, these parameters are hnearly remapped to the [0.1, 0.9] interval, 
values that are suitable for the activation function. 
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2.3. The training data-set 

The goal of the EBAI project is to rapidly process large amounts of survey data. If 
the network is to perform successfully, the training sample has to be representative of those 
data. Although survey results are impressive because of the sheer number of the observed 
targets, they lack in accuracy, data diversity (i.e. simultaneous photometry, spectroscopy, 
astrometry) and often have poor phase coverage. We typically have a single photometric 
light curve in instrumental magnitudes with errors as great as ~ 5%. There may be fewer 
than ~ 100 data points, and because of the typical vagaries of survey/mission operation 
(scanning laws) these do not cover the phase range uniformly. Well detached EBs might 
thus have only a few points in the eclipses, with practically flat quadratures. It is important 
to be aware of these limitations in the observed data, since we cannot expect ANNs to do 
more than a human could do interactively. From a poor light curve there remains a very 
limited amount of information that can be extracted reliably. 

Our training data-set is a representative version of the expected realistic data described 
above. We created an initial sample of 33,235 normalized light curves in the Cousins I 
band, each consisting of 200 data points distributed equidistantly in phas e, and with added 



synth etic white noise. We used a Monte Carlo-based script in PHOEBE (jPrsa and Zwittei 



20051 ) that randomly selected the values of parameters of detached eclipsing binaries ac- 
cording to the distribution functions depicted in Fig. HI For eccentric binaries a phase shift 
A$ = (Mc + uj — l/4)/27r, where is the mean anomaly at conjunction, was introduced 
so that the primary eclipse coincides with phase 0. The number of created light curves 
(33,235) was chosen so that it is close to 8^ = 32768 and a multiple of 23 to optimize use of 
a 24-node Beowulf cluster. The first criterion means that the 5-parameter grid resolution is 
~ 8x the interval width: the determination of the argument of periastron, for example, has 
a resolution of 27r/8 = 7r/4. Beyond that, the accuracy is limited by the ANN interpolation 
error. Of all parameters, the limiting interpolation capability of the network affects only 
the argument of periastron since its signature is present in only a small subset of the whole 
sample (EBs with significant eccentricity). Tests have shown that increasing the number 
of exemplars improves the determination of the argument of periastron significantly. Other 
parameters do not depend as strongly on the number of exemplars. 

Although ANNs depend only weakly on the distribution functions of exemplar param- 
eters, some care should be taken to avoid systematic effects due to the inherent degeneracy 
of EB light curves. For example, the initial training sample was created uniformly across 
all parameters and, although ANN performance was reasonable, it exhibited clear system- 
atics in cases where light curves are not sensitive to parameter variation. This was most 
noticeable for eccentric binaries with arguments of periastron close to ±7r/2: eccentric orbits 
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with minima separated by half the orbital period statistically overwhelmed circular orbits, 
causing the network to bias eccentric binaries with semi- major axes aligned with the line of 
sight to the circular binaries. To reduce this and similar effects, the distribution functions 
are selected so that systematics due to degeneracy are minimized. Thus, a follows a gaus- 
sian distribution function 7\^(1.0, 0.2); (5 is sampled uniformly between 0.01 and 0.5, with the 
ratio of fractional radii distributed by AA(1.0, 1.3); 7 and 5 follow an exponential distribution 
£^(0,0.05) in e and a uniform distribution in uo; finally, e follows a uniform distribution in 
i with the built-in cut-off point = igrazing — 1°, where grazing is the critical inclination 
for e clipses to occur. Li mb darkening values are interpolated for each generated light curve 
from Ivan Hammd (119931 ) tables, and gravi ty darkening coefficient (3 is set to 1.0 for radiative 
envelopes (Teff > 7500 K; Ivon Zeipellll924j ) and 0.32 for convective envelopes (Tcfr < 7500 K; 
Lucylll967l ). A canonical value of the mass ratio q = 1.0 is used for all curves. In the absence 



of other types of data (spectra or RVs), the effects of the degeneracy can never be completely 
removed, only reduced. 



2.4. Fitting a smooth curve to observed data 



The network expects its input to be given at 200 equidistant phases ranging from —0.5 
to 0.5, with phase corresponding to the primary (deeper) minimum. To achieve this, 
the observed data points must be represented by a smooth analytic function that can later 
be sampled in equidistant phase points. In the special circumstances - complete/uniform 
phase coverage and low noise, the data can simply be interpolated or binned into normals. 
Unfortunately, neither condition is usually satisfied by survey data. The task of fitting a 
smooth curve to such observed data proved quite challenging: common fitting functions 
like Fourier series, orthogonal polynomials (Legendre, Chebyshev, etc) , or splines do a poor 
job on real-world detached binary light curves. Due to the function regularity requirement 
(functions must be connected and differentiable in every point), data noise induces strong 
oscillations in the fitting functions, making them virtually useless (cf. Fig. E]). We therefore 
devised a dedicated algorithm for fitting a smooth curve to detached binary data. Our 
algorithm polyfit fits a polynomial chain V{x) of piecewise smooth n-th order polynomials 
connected at a given set of knots. The two key ideas of the algorithm are 1) to abandon the 
requirement of differentiability of V{x) at knots and thus allow the polynomial chain to be 
broken, and 2) to vary the position of the knots iteratively and thus relax the system to the 
nearest minimum. The formal description of the algorithm with the implementation details 
are given in Appendix \M Fig- M shows some examples of the algorithm's performance on 
OGLE data-set. 
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Fig. 4. — Distributions of the principal parameters {a through e; left panels) and of the 
corresponding physical parameters (Ti through i; right panels). A Monte-Carlo generator 
was used to create a training set of 33,235 synthetic light curves with parameters sampled 
randomly, according to these distribution functions. 
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Fig. 5. — Examples of a cubic B-spline fit to OGLE data. In the first case (128 knot spfine) 
the overaU shape of the hght curve is reproduced, but at the expense of large oscillations. In 
the second case (32 knot spline) the fit to eclipses is deteriorated, and the oscillations still 
remain. Splines proved not well suited for fitting detached EB light curves. 
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2.5. ANN sensitivity to light levels 

The generated exemplar light curve fluxes are normalized to the sum of both compo- 
nents' passband luminosities per steradian's worth of area, namely /norm = 1 = (-^i + -^2)/4vr, 
which assigns unity flux to spherical stars of the given luminosity. Thus, the quarter-phase 
flux will be unity for well detached binaries and close to unity for deformed stars because 
of ellipsoidal variations and reflection effect. The rationale behind this choice of normaliza- 
tion lies in the nature of observed light curves: were we to normalize synthetic light curves 
to maximum light, we would have to do the same for observed light curves, a poor choice 
because we would be normalizing to outliers in the observed data. 

The question is, then, how to normalize observed data so that light levels are consistent 
with those of the exemplars. To do that properly, we would need to know the amount of 
distortion of each component, an unknown which the network is supposed to yield. It turns 
out that this apparent problem is not a problem at all: neural networks predominantly learn 
the shape of the light curves rather than normalization. Fig. [7] depicts the dependence of 
the sum of squares of residuals on light level for 5000 exemplars. Varying each exemplar's 
normalization level according to the given a (ranging from 0% to 50%), the data were 
propagated through the network and the change in the sum of squares of residuals was 
observed. Since we normalize all light curves to 1.0 at quarter-phase, this introduces ~5% 
error in normalization for signiflcantly distorted binaries. This error corresponds to less than 
0.2% in the sum of squares, which in turn corresponds to less than 0.01% in parameter errors, 
negligible for all practical purposes. 



2.6. Selection of the optimal ANN topology 

Three-layer back-propagation neural networks are w ell known for their robustne ss and 



applicability on a wide range of computational problems (IFreeman and Skapuralll99ll ). EBs 
proved to be no exception: tests with two or more hidden layers had shown that there is no 
obvious beneflt for convergence stability. The real challenge was to determine the optimal 
sizing of the network topology. The input layer is constrained by the input data (200 data 
points at equidistant phases), and the output layer is constrained by the output parameters. 
Thus, the topological parameter to be explored is the number of hidden units. We trained 
the network identically for different numbers of hidden units (Fig. [8]), which yielded the cost 
function value for individual parameters (solid bars) and the complete output layer (outline 
bar) as a function of the number of hidden units. With few hidden units, there is insufficient 
freedom for the network to learn to map esincj and ecosu, whereas too many hidden units 
cause the network to oscillate by learning noise traits. The optimal choice corresponds to the 
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Fig. 7. — Neural network performance on randomly normalized data. The depicted range 
oi a — 0-50% (0-6% at the inset) in random displacement clearly shows that the network 
performance is independent of the input normalization level: 5% error in normalization 
propagates to less than 0.2% in the sum of squares of residuals, which in turn implies even 
smaller errors in terms of parameter accuracies, negligible for all practical purposes. The 
initial decreasing trend in T2/T1 and ecosa; is a consequence of the 0.5% jitter added to the 
exemplars during the learning phase. 
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number of hidden units where the cost-function dependency levels off, at around 40 units. 
This value was selected for the final network topology. 

The network learning rate parameter a determines the step size along the direction 
of the negative gradient during back-propagation. If a is too small, learning requires an 
increased number of iterations. If it is too large, the correction overshoots the oscillates 
about the target value. Fig. [9] depicts learning curves for a ranging from 0.01 to 2.0. The 
value a = 0.1 proved best for the adopted network topology, and it was held fixec^l during 
learning. 

3. ANN performance and results 

The principal benefit of ANNs is speed; a trained network typically takes seconds on an 
average computer to process even the largest EB databases. However, the most important 
question is the reliability of the parameters reproduced by the network. After training 
the network with 33,235 exemplars, it was tested on a distinct synthetic sample of 10,000 
detached EB light curves. The second test was performed on the CALEB database of 
real-world binaries, where we extracted 50 detached EBs and passed them through the 
network. For these we could compare the known values of model parameters from the 
sample with parameters yielded by the network. After making sure the ANN performance 
was satisfactory, we submitted 2580 detached EBs from the OGLE database for which we 
had no prior solutions. We present the results of the analysis below. 

3.1. Network training 

All training light curves were pre-processed by the polyfit algorithm. Although the 
equidistant phase requirement can be easily achieved for synthetic data, neural networks 
perform poorly if the training set is fundamentally different from the unknown data-set 
despite their apparent similarity. This requirement was met by fitting a polynomial chain 
to model light curves and training the network on such a pre-processed sample. Another 
merit of pre-processing the sample is a reduction of systematic error: if a polynomial chain 
fit exhibits systematic effects on an observed light curve, it will likely exhibit the same 
systematic effects on the model light curve. 



^ There are back-propagation training algorithms that increase the learning rate parameter with the 
number of iterations, but we found no obvious necessity for implementing such a strategy. 
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Fig. 8. — Network performance as a function of the number of hidden units. Filled bars 
correspond to the sum of squares of the residuals for individual parameters {a: green; (5: 
blue; 7: purple; 5: cyan; e: yellow) of 33235 exemplars. The outlined red bar depicts a 
rescaled sum of squares of the residuals for all 5 parameters, i.e. the actual cost function of 
the network. The network performs reasonably well for 25 to 65 hidden units; given that the 
training time cost grows quadratically with the number of hidden units, the smallest still 
satisfactory number should be adopted. In accordance with the above diagram and following 
the detailed analysis of the residual trends (deviations from the normal noise distribution), 
we set the number of hidden units to 40. 
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Fig. 9. — Comparison between learning curves for different learning rate parameters (a = 
0.01, 0.1, 1.0, and 2.0). For small values of a the learning curve is smooth, but requires 
additional iterations to converge. For large values of a the learning curve oscillates as 
parameter values overshoot the targeted minimum. After a comprehensive search over a 
fine-grained range of learning rate parameters (only 4 are depicted here for clarity) we find 
that the optimal value is a = 0.1. 
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The back-propagation stage (ANN training) is the most computationally intensive block 
of the algorithm. It is inherently serial, since the weight matrices are updated one exemplar 
at a time. Given the network topology and a large number of exemplars, it would typically 
take a very long time (days, even weeks) to calibrate and train the network adequately. For 
example, the time required to train a network on a single 2GHz processor (250 s for 1000 
iterations on a single CPU, thus ~1.45 days for 500,000 iterations; cf. Figs. [TOl [11]) equals the 
time required to compute ~12,500 eccentric light curves (~10s per eccentric light curve com- 
putation). Yet this comparison does not take into account the overwhelming computational 
time to calibrate network topology (in our case we trained at least 50 different networks), 
which in most circumstances dominates the overall processor time. To overcome this, we de- 
vised a parallel version of the back-propagation algorithm that partitions the sample among 
available processors in a Beowulf cluster. Each processor performs back-propagation on a 
chunk of the sample, and communicates the locally modified weight matrices back to the 
main unit. The main unit then assembles the final matrices by weighted averaging and 
broadcasts them to compute nodes for the next iteration. If the chunks are sufficiently 
large, there will be only a slight penalty for not performing a rigorous gradient computation. 
Moreover, there is an unexpected benefit: the learning process is better able to avoid local 
minima because of the perturbation of the steepest descent. Fig. [TOl shows the net speed-up 
as a function of the number of processors: running on 24 nodes, the speed gain is ~ 15 x, 
reducing 2 weeks worth of processor time to less than 1 day. 

The network with 200 input nodes, 40 hidden nodes, and 5 output nodes was trained 
for 500,000 iterations with a learning rate parameter a = 0.1. Variable white noise with 
a ranging from 0.005 to 0.02 was added to the sample of 33,235 synthetic light curves. 
Exemplars were created from these light curves by polyfit pre-processing. Fig. [11] shows the 
learning curve: sum of squares of the residuals on the output layer as function of iteration 
number. The curve is monotonically decreasing, with the convergence rate falling below 
10~^/iteration at around the 470000-th iteration. We tested convergence stabihty for the 
total of 10 million iterations and noted no deterioration of the fit (i.e. from over-training) nor 
significant improvement of the adopted result. Fig. [12] depicts the transformation matrices 
from input to hidden (I— >H) layer (left), and from hidden to output (H— >0) layer (right). 
The I— i>H matrix determines general mapping properties and projects eclipse patterns clearly; 
it does not change substantially after the first 10000 iterations. The H— i>0 matrix serves 
for fine-tuning; it continues changing as iterations proceed and shows no clear patterns that 
would relate to either layer in any obvious way. The results of training are depicted on 
Fig. [13] The following traits for each parameter may be deduced from this figure: 

T2/T1. Since surface brightness ratio, which is directly related to the ratio of depths of both 
eclipses, only approximately relates to the temperature ratio, it is no surprise that 



-22- 



T3 

e 
o 
o 



c 
o 



o 
o 
o 



Oh 

o 



250 
200 
150 
100 
50 




■ 



5 10 15 20 

Number of processors used (MPI coupling) 



25 



Fig. 10. — Back-propagation speed-up resulting from multi-processor parallelization. The 
framework used is Message Passing Interface (MPI) as implemented by the OpenMPI li- 
brary (http:77www.open-mpi.org). The algorithm was implemented on a dedicated 24- 
node 2GHz Opteron cluster. The net yield is ~ 15x speed-up with respect to the serial 
uni-processor implementation. 
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Fig. 11. — Learning curve. The number of iterations on the main plot is depicted in log scale. 
Cost function is normalized per exemplar, i.e. Ylp=ii^p ~ ^pf' i where N is the number 

of exemplars, Op and Cp are input and output values of parameters, respectively, index i goes 
over all exemplars and index p goes over all output parameters. The inset depicts the final 
100,000 iterations on a linear scale. 
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Fig. 12. — Weight matrices. The I— >H matrix (left) plays a dominant role in mapping general 
LC features. There are pronounced peaks around the 100-th unit that corresponds to phase 
0, and around the 0-th and 200-th units that correspond to phase 0.5. It is interesting to 
observe the transverse patterns in the I— >H matrix, indicating that the network not only 
learns the phase position of the LC features, but also the variation of slope along the phase. 
The H— s>0 matrix (right), on the other hand, serves for fine-tuning of the mapping. It 
does not exhibit any obvious form and the weights change throughout convergence both in 
magnitude and in distribution. 
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Fig. 13. — The results of neural network training on 33,235 exemplars - light curves gener- 
ated by the WD engine. The correlation between input values and recognized output values 
is shown for each parameter and offset by 0.5 for clarity, with parameter values being linearly 
mapped to the [0.1,0.9] interval for activation function efficiency. Labeled guidelines repre- 
senting ideal performance are provided for easier comparison. Strong systematics are evident 
in the T2/T1 parameter, due to the poor approximation of the surface brightness ratio by 
T2/T1. Fortunately, temperature ratio is the parameter with most rapid DC convergence 
(demonstrated in Section 13. 3p and a single DC iteration is usually necessary to correct for 
this problem. The remaining trends and areas of increased scatter are either statistically 
insignificant (< 5% in 90% of all cases; cf. Fig. [H]), or are caused by a well-understood 
degeneracy that is discussed at length in the text. 
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this parameter was the toughest to be reproduced by the network. Most systematic 
error occurs near T2/T1 ^ 1, where the model would expect equal depth eclipses. 
These of course depend also on the ratio of radii and on the radial component of 
eccentricity, so high scatter at the upper end of the distribution is not surprising. 
However, temperature ratio is the parameter with the most rapid DC convergence and 
it often takes only a single DC iteration to significantly improve it. For this parameter 
49% of the sample was reproduced to 2.5% accuracy (cf. Fig. [T^ . 

Pi + P2- Neural network convergence was most rapid for the sum of fractional radii, cca. 1000 
iterations required for ANN training. This arises because the eclipse width for detached 
binaries is a well behaved parameter, i.e. the deformation of stellar surfaces does not 
have a significant influence on the duration of the eclipse. The fall-off at the lower end 
can be attributed to phase coverage: given the number of equidistant phase points, for 
wide EEs there are only a few pixels in the minima and these do not suffice for reliable 
discrimination. The upper end deviation, on the other hand, is due to degeneracy 
between the radii (surface potentials) and the inclination: the effect of slightly enlarging 
the radii or by slightly increasing the inclination on eclipse width is indistinguishable. 
For this parameter 81% of the sample was reproduced to a 2.5% accuracy. 

esina;. The geometric significance of the radial component of eccentricity is in the ratio of 
the durations of both eclipses and is thus a second order effect. Although the network 
does learn the trend for weakly eccentric binaries, it systematically fails for strongly 
eccentric binaries. There are two reasons for this: 1) very eccentric orbits imply widely 
detached EEs typically with poor eclipse phase coverage, and 2) the number of cases 
with e > 0.2 in our training sample is only ~ 5%. We thoroughly tested the latter 
issue by training the network with 5000, 10,000, 20,000, and 33,235 exemplars; the 
improvement in the determination of esincj was observed as anticipated. The phase 
coverage issue could be solved by increasing the number of phase points, but given 
the scatter in survey data-sets we anticipate the loss of accuracy on this account to be 
minimal. The bunching at mid-range is expected, as this corresponds to near-spherical 
orbits, and the majority of systems are in this range. For this parameter 75% of the 
sample was reproduced to a 2.5% accuracy. 

e coscij. Contrary to the radial component, the tangential component of eccentricity is better 
determined by the network. Its geometric significance lies in the phase displacement 
of both minima and is thus a first order effect. This parameter does not suffer from 
any obvious systematic effects. For this parameter 91% of the sample was reproduced 
to a 2.5% accuracy. 

sinz. The inclination proved to be recovered reliably for all practical purposes. The lower 
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end discrepancy is of no practical importance because it corresponds to grazing eclipses 
and non-eclipsing binaries that constitute ~1% of the sample. The increased scatter 
on the upper end is due to the degeneracy between the radii and inclination already 
discussed. For this parameter 88% of the sample was reproduced to a 2.5% accuracy. 



3.2. Test 1: Synthetic sample of 10,000 EBs 

As a first test we evaluated the ANN's performance on a sample of 10,000 detached 
EB light curves distinct from the training set. The test sample was created using the same 
method as the training sample, following the same probability distribution functions for 
parameter values (cf. Fig. Hj) and adding variable amounts of white noise randomly chosen 
between a = 0.005 and a = 0.02. The sample was then pre-processed with polyfit to obtain 
polynomial chain fits to the data. Fig. [15] shows the results of this test. The success rate 
of recognition is comparable to that of the learning sample (71% of T2/T1, 95% of pi + P2, 
89% of esinu, 96% of ecosa;, and 96% of sini are determined to better than 5%), and 
the underlying distribution of errors remains the same. This demonstrates the capability of 
the ANN to successfully recognize data it has never encountered before. In the worst case 
scenario (see cumulative inset in Fig. [T5l) the network yields parameters accurate to 10% 
for 90% of the sample. This implies that the ANN output on unknown data is viable for 
statistical analysis and as input to sophisticated modehng engines for fine-tuning. 



3.3. Test 2: 50 real-world stars from the CALEB database 

Further testing of our approach to automated light curve solution - using a trained 
ANN followed by optional differential corrections - on real-world detached binaries from 
the CALEB database. Manual inspection of 59 candidate light curves eliminated 9 as ei- 
ther semi-detached or overcontact, leaving 50 test binaries. Of these, 7 stars had published 
model parameters that did not exactly reproduce the observed LC, which is not surpris- 
ing given the heterogeneous nature of CALEB data. For each binary, the chi-squared 
value of the model light curve generated from the ANN output parameters determined 
whet her 0, 1, 2, or 3 passes wo uld be made through the differential corrections program 



DC (jWilson and Devinneyi Il97ll ). Table IX^ shows the results. For 22 systems, the ANN 
output generated sufficiently good fits that DC was not invoked. 1 DC iteration was per- 
formed for values between 1.5 and 2.0 (16 systems), 2 DC iterations for values between 
2.0 and 2.5 (5 systems), and 3 DC iterations for values of 2.5 or higher (7 systems). Each 
of the remaining systems for which DC was invoked converged. Interestingly, for 7 of these 
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Fig. 14. — Error distribution of ANN output parameters for the training sample. The bars 
depict the fraction of all EBs with errors between 0% and 2.5% (first bin), 2.5% and 5% 
(second bin), etc, for individual parameters. The ecoscj parameter has the highest success 
recognition rate, and T2/T1 has the lowest success rate because of its approximate B2/B1 
behavior. The inset shows the cumulative distribution for the same data: 90% of all cases 
have errors in all parameters smaller than 10%. 
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Fig. 15. — Neural network recognition performance on 10,000 unknown synthetic LCs. The 
test sample was created using the same method as the training sample. Left: comparison 
between input and output values of individual parameters. Comparing against Fig. [13] shows 
that not only is the success rate of recognition comparable to that of the training set, but it 
also has the same underlying features identified and interpreted in the training phase. Right: 
distribution of differences (main graph) and their cumulative distribution (inset). Again, the 
same distribution as that of the training sample is seen. 



systems the published parameters were discrepant with the results here, which suggests an 
additional benefit to a uniformly-based automated analysis. 



Table 2. Comparison between CALEB values of parameters {ac through e^), ANN values of parameters {a^ 
through ejv), and DC- improved values of parameters where applicable {aoc through eoc)- 
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The results for 50 CALEB stars seem promising: the network managed to get a sat- 
isfactory starting point for all LCs, and DC converged successfully to the final solution. 
Comparing ANN results against published solutions cannot be done rigorously since, as 
stated previously, there are systematic discrepancies in the published data; inspecting Table 
13.31 and LCs given in Appendix [B] reassures that the network, powered by DC, was capable 
of getting probable parameters for 50 out of 50 CALEB test stars. 

For the continuation of this test we initiated an undergraduate project at Villanova and 
Eastern Universities to review, enhance, and extend the CALEB database with modern data. 
This ambitious project aims to collect LC and RV curves, spectra, and model solutions for 
the total of ~1000 EBs. That will be roughly a factor of 3 of improvement when compared 
to CALEB. This database will be publicly available. 



3.4. Application: 2580 stars from OGLE detached EB catalog 



Wyrzykowski et al.l (120031 ) published a catalog of 2580 detached EBs detected in a 4.6 
square degree area of the central parts of the Large Magellanic Cloud (LMC) which includes 
photometric data collected during the OGLE-2 microlensing search from 1997 to 2000. The 
sample consists of 2681 entries d ue to field overlap. The samp le was previou sl y use d for 
statistical stu d ies of EBs, e.g. by iMichalska and Pigulskil (120051 ). iMazeh et al.l (120061 ) and 
Faccioli et al.l (120081 ). We took the sample in its entirety, without pre- filtering (no false 
positive removal, no whitening). Polyfit was applied to obtain analytic approximations to 
light curves and compute equidistant data points for input to the trained ANN. The ANN 
processing time of the OGLE sample on a 2GHz processor was under 1 second. Fig. [TBI shows 
the statistical distribution of the obtained parameters by ANN. 

The following properties of the OGLE sample are observed: 



the distribution of jS = pi + p2 shows a selection effect in the sample: the number 
of data points per light curve, coupled with the classification scheme used to gener- 
ate the model (^namely im age recognition on a rasterized/degraded light curve 
Wyrzykowski et al.l (120031 ) for details), favors close contact binaries; 



see 



the distributions of eccentricity e and argument of periastron u are coupled. There is 
an apparent deficiency of systems with near zero eccentricity (only 8% instead of the 
expected ~30%) on account of the excess of low eccentricity systems with arguments 
of periastron close to 90° and 270°. In these cases the line of sight is aligned with 
the semi-major axis and the eclipses are exactly half a period apart, and the sum of 
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Fig. 16. — The distributions of parameters obtained by the ANN from the OGLE sample of 
2580 LMC binaries classified as detached. 
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excesses exactly matches the deficiency in zero eccentricities. This is not a systematic 
effect] rather, the inverse problem is inherently degenerate - there is no information 
in the light curves of detached EBs that would allow discrimination between these 
systems. 

the distribution of e = sin i closely follows the expected distribution, with the exception 
of a slight deficiency of systems close toi = 87°. Given that higher inclinations typically 
correspond to total eclipses, and smaller inclinations correspond to partial eclipses, 
there is no solid handle for systems close to this boundary: these can be either total 
or partial eclipses and the network cannot discriminate. This effect is again inherent 
to the problem, t his ti me the degeneracy is between the inclination and radii (see 



Prsa and Zwitted (120051 ) for a thorough discussion of this degeneracy). 



the most interesting distribution is the temperature ratio a = T2/T1. There are two 
obvious contributions: a half Gaussian-like distribution with the center around T2/T1 = 
1, and a distinct excess superimposed at approximately T2/T1 = 0.5. Careful analysis 
has shown that the T2/T1 excess has an exact counterpart in the (3 distribution, between 
0.4 and 0.5. There is no correlation observed between T2/T1 and eccentricity (which 
could in principle exist because of the influence of eccentricity on eclipse depth), nor 
in fact with any other attained quantity except pi+ p2- We also ascertained that there 
is no similar excess in a synthetic sample. The source of the hump around T2/T1 = 0.5 
is not entirely clear; it could be due either to the selection effect of the OGLE sample 
(i.e. semi-detached and overcontact EB contamination of the sample), or be caused by 
a real physical duality in the sample. Either way this result merits further study. 



The obtained parameters are not flnal. The next step would be to feed the ANN results 
into the light curve solver, similar to what was done for the CALEB data presented in Section 
13.31 Since this paper focuses on the performance of the ANN tier alone, we defer a more 
thorough analysis of the OGLE data to a follow-up paper. 

The results, although rudimentary and uncorrected for sele ction effects, a llow c ompar- 
ison with the extensive study of LMC short period binaries by iMazeh et al.l (120061 ). They 
draw their sample from the same pool of 2580 EBs, trimmed by a detectability and quality 
factor. Their data were limited to: periods less than 10 days, /-band magnitudes between 
17 and 19; binary components on or close to the main sequence, yiel ding a sample of 938 
binaries. Our histogram of the sum of radii (cf. [161 top right) matches iMazeh et al.r s result 
well and generalizes it to a certain extent because of the unfiltered sample and the underly- 
ing Roche formulation, as it properly accounts for distorted stars. We also find the bimodal 



histogram in surface brightness (in our case its proxy T2/T1; top left), which iMazeh et al. 
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attribute to short period, presumably mass -transferred b inaries. Note that this feature is 
present even after their selection correction. iMazeh et al.l do not use the inclinations in their 
statistical analysis. As noted earlier, our inclination distribution (bottom right inset) is 
somewhat hindered by the degeneracy, but its overall statistical significance should be valid 
for the whole OGLE LMC sample of EBs. 



3.4- 1- Inferential science 

EB parameters gathered from large surveys can yield statistical insight into the physics 
governing their distributions. Since our tests of ANNs are shown to provide statistically 
viable results, the question is what scientific results can we extract from the sample. Even in 
the absence of thorough analysis with heuristic error estimates, additional data and dedicated 
follow-ups, the science extracted from large numbers of automatically analyzed LCs can 
nevertheless be important because we learn physical properties by inference from the whole 
sample. 

An example of inferential science is the correlation between the sum of fractional radii 
and eccentricity. It is well known that circularization and synchronization mechanisms de- 
pend strongly on the relative separation between stars due to tidal forces, torque, magnetic 
coupling, etc, leading to a correlation of pi + p2 with e. Fig. [T7] (left) shows the results 
for the OGLE sample: for each binary, EB parameters yielded by the network were fed 
back to the modeling engine, a synthetic LC computed and passband luminosity adjusted 
to match the data. The value, computed from the residuals was used to qualify the fit: 
diamonds in Fig. [T7] represent solutions with ^ 1-3, whereas pluses represent solutions 
with > 1-3. The solid line is an exponential function fit to the data so that points with 
up to 3a deviation lie below and points with up to la deviation lie above the boundary. The 
fits are systematically poorer at lower values of pi + p2 due to under-sampling in very narrow 
eclipses and the under-sampling of the network. Fit degradation can also be observed above 
the boundary due to contamination of the detached OGLE database with close-to-contact 
and overcontact systems that the network was not trained to recognize. Fig. [17] reveals a 
strong correlation between pi + p2 and e. Tidal circularization would suggest that stars 
close to the boundary are young, with age increasing gradually toward the bottom of the 
plot. The exponential-like dependence is intriguing and warrants comparison with a more 



rigo rous model to deriv e specific results for circularization rates. iNorth and ZahnI (120041 ) 



and iMazeh et al.l (120061 ) observed the dependence of lecoscjl on the relative radius p; the 



equivalent plot for AN N data with < 1.3 is depicted on Fig. [T7| (right). The results of 



North and ZahnI (120041 ). performed on a filtered sample of nearly equal B-type components. 
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indicate a linear dependence of lecoscjl on p, with a sharp cut-off pc at the onset of which 
all orbits ap pear circularized . The solid line in Fig. [T7| represents the occurence boundary 
observed by North and Zahru (120041 ) and a dashed line represents the boundary observed by 



Mazeh et al.l (l2006l ) . Our data show remarkable agreement in the estimate of pc-, but pos- 
sibly suggest a somewhat shallower linear dependence than that presented by the previous 
studies. This might be a consequence of approximating (p) =13/2 for the whole OGLE 
sample and will be studied more thoroughly in future. 



3.5. Discussion 



The EBAI project is motivated by the goal of maximizing the scientific output resulting 
from the analysis of the future fire-hose of binary star data. For this, we are committed to 
the application of a rigorous physical model (here, the WD model) that encompasses the 
complete range of binary star configurations. Moreover, we want no efficiency penalty for 
this, and ANNs are a practical means to this end, capable of performing a forward pass of 
thousands of observed light curves through the trained network in a fraction of the time 
required to compute a single theoretical light curve! While this paper focuses on detached 
EBs, the method is readily extendable to systems with large proximity effects as fully treated 
in the WD model and will be discussed in a subsequent paper. We will thus be able to apply 
ANNs in a unified, a utomated ap proach to solutions for eclipsing binary light curves of all 
classes. By contrast, [DevoiJ (120051 ) 's solutio n approach is time e fficient, but hmited to well- 
detached systems with no proximity effects. iTamuz et al.l (120061 ) 's approach employs a more 
realistic EBOP model, which is still limited in handling proximity physics. Nevertheless, 
these pioneering efforts have indeed shown their value in applications to large survey data. 
A computational penalty, if any, for using more sophisticated models will eventually vanish, 
thanks to Moore's Law. 

The benefits of ANNs here are not limited to speed, bu t include a non- linear inter- 
polation capability widely claimed to be robust to noise (e.g. iFreeman and Skapuralll991 



pg. 246). Characteristically for ANNs such a property is difficult to quantify, but our expe- 
rience supports this claim. Preliminary tests have been done on light curves with significant 
secular variations, most n otably due to chrom ospheric activity and proximity effects. One 
such example is V380 Cyg (IGuinan et al.ll2000l ). an active eccentric binary with total eclipses. 
Although the feedback from the network failed to reproduce the shape of the light curve (as 
there is no provision for such secular variations), the parameters are reasonably close to the 
published values. This observation indicates that the ANN could cope with such challenging 
cases even though the hght curve is severely affected by the secular effects. We plan to test 
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Fig. 17. — Left: The correlation between the sum of fractional radii and eccentricity for the 
OGLE LMC sample of 2681 stars. Diamonds denote solutions with smaller than 1.3, and 
pluses denote solutions with larger than 1.3. The values of are computed by feeding the 
ANN-derived parameters to PHOEBE, computing the corresponding synthetic curve, scaling 
it vertically to match the input curve and computing the residuals. The solid line represents 
the exponential function fit to the 3cr-lo" boundary, where all points were taken into account. 
The unusually large number of poor fits above the boundary is due to close-to-contact and 
overcontact EBs in the OGLE sample which the network was not trained to recognize. The 
increased number of poor fits towards low pi + p2 is due to training set under-sampling 
and phase resolution deficiency, as discussed in Section [XT] — Right: |ecosa;| versus relative 
radius for all OGLE LMC binaries w ith smaller than 1.3. A solid line represents the 
solution from iNorth and ZahnI (1200411 der i ved fr om a sub-sample of B stars; a dashed line 
represents the solution from iMazeh et al.l (l2006l ). The critical radius where orbits rapidly 
become circular is pc ~ 0.26, in perfect agreement with the previous studies. 
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such EBs further and to assess the sensitivity of the ANN on the amphtude and periods of 
intrinsic variations. 

The output of ANN is not a final solution; rather, it provides initial parameter estimates 
for dedicated EB modeling engines. These engines refine the result, estimate formal and 
heuristic errors of the fit, and produce statistics for the sample. A preview of this approach 
has been presented for CALEB data in Section 13.31 

But what happens when the EB-trained ANN is input some non-EB periodic light 
curve (a Cepheid, RR Lyr, etc.)? By itself, the ANN tier has no way of detecting this, it 
will produce a set of parameters for such a curve as well. However, the very next step in 
the IDP is to pass these parameters to PHOEBE and to compute the value of the solution. 
This value is an immediate indicator of the goodness-of-fit and can be used for rejection, 
re-training of the classifier, and subsequent re-classification. This procedure is expected to 
be quite effective, since the WD model acts as a matched filter for EBs, and this property is 
inherited by the EB-trained ANN. 

4. Conclusion 

ANNs have proved successful in problems of classification, real-time control, data min- 
ing and many other tasks in a variety of scientific and technical applications. This paper 
demonstrates their utility in parameter estimation for detached binaries, even in the face of 
parameter degeneracy, the correlation of certain pairs of parameters; e.g., the compensating 
effect of increasing a system's sum of radii vs. increasing its inclination. 

A task-optimal ANN topology and learning rate was determined and the network was 
trained on the rigorous WD model. The trained network was tested on a previously unseen 
test set of 10,000 WD model light curves, yielding model parameters to better than 90% 
accuracy for 90% of the test set. In tests with 50 real-world detached binary systems from 
the CALEB database, the ANN alone achieved sufficiently good fits for 22 systems that 
differential corrections were not needed. Those requiring differential corrections all converged 
to produce good quality fits, confirming the utility of the ANN-produced starting parameters. 
Overall, the CALEB test produced a 100% success rate for the ANN. 

We have shown that a suitable ANN can be successfully trained on a sophisticated EB 
model, and that the trained network produces quite satisfactory approximate light curve 
solutions with high computational efficiency. In addition to these advantages, ANNs have 
favorable properties of interpolation; they are well-behaved in regions around their multi- 
dimensional training points. Thus the network will interpolate reasonable starting parame- 
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ters for light curves with asymmetry or spots, and in fact for any (detached) hght curve that 
it has not seen before. It is quite remarkable that the network is able to memorize to such 
high quality all the light curve solutions in the space of detached binaries with only 8240 
numbers (201 x 40 + 40 x 5 weights). Clearly, this network has sufficient information stor- 
age capacity, but it is worth noting that the information c apacity corresponding to a give n 



network topology is only known in heuristic approximation (INeelakanta and De Groflal 19941 ). 
This confirms that much is yet to be learned about ANNs, and in particular it explains the 
necessity to explore network topologies and learning rates for their appropriateness to the 
given problem. Such exploration, in addition to the ability to rapidly train many iterations, 
further emphasizes the need for speed in the learning phase. 

This work was greatly facilitated by two developments. The polyfit algorithm produced 
excellent analytic approximations to real-world light curves sampled at (of ten quite) unequal 



phase intervals, where splines or other interpolating methods behave badly (lEmery and Thomson 



200ll ). making equal-phase interpolated data points available for input to the ANN. This al- 
gorithm has been found to be an excellent interpolator for EBs of all classes, as well as, for 
example, Cepheid variables. Since well-behaved interpolation is a problem common to many 
fields, polyfit could well see further application beyond astronomy. The scheme for paral- 
lelization of ANN training over multiple processors was key to quickly exploring topologies 
and training parameters, and churning through a half-million training iterations in days in- 
stead of weeks. This scheme had the beneficial side effect of lending simulated annealing 
behavior to the training phase, improving the search for the global minimum. 

The source code for polyfit and ANN is released under the GNU General Public License, 
which grants users the right to freely use, distribute and modify the code. The programs 



may be downloaded from the IDP project homepage: http : //www . eclipsingbinaries . org 



or PHOEBE homepage: http://phoebe.fiz.uni-lj.si 



This paper focuses on detached EBs. Systems with large proximity effects which are 
fully treated in the WD model will be discussed in a subsequent paper. We will thus be able 
to apply ANNs in a unified, automated approach to provide starting solutions for eclipsing 
binary light curves of all classes. 
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A. Polyfit algorithm - details 

In Section [23] we introduced a light curve fitting algorithm polyfit: a polynomial chain 
fitter. In this appendix we provide a formal description and comment on the implementation 
details. The polyfit algorithm fits a polynomial chain V{x) of piecewise smooth n-th order 
polynomials connected at the given set of knots. The two key ideas of the algorithm are to 
abandon the requirement of differentiability of V{x) at knots and thus allow the polynomial 
chain to be broken, and to randomly perturb the position of the knots and relax the system 
to the nearest minimum. 

1. Given the set of knots Xk] k = 1 . . . N, Xk+i > Xk V/c, partition the phase range into 
intervals: 

Il = [Xi,X2), I2 = [X2,X3), In=[xn,Xi). 

2. Given the first interval Ji = [xi,X2), do an n-th order polynomial least squares fit 
through data points on that interval: 

Pi{x) = a\{x — xi)" + al^_^{x — xi)"'~^ + . . . + a\{x — xi) + Og, x e [xi, X2) = h- 

3. Given the set of intervals 7^, k = 2 . . . (A^ — 1), do the n-th order polynomial least 
squares fit through the data enclosed between two adjacent knots: 

Pk{x) = a'^{x - XkT + a\-i{x - XkT~^ + • • • + a\{x - Xk) + cio' ^ ^ i^k, Xk+i) = h 

while satisfying the constraint that the polynomial pk{x) must be connected with 
Pk-i{x) at the knot Xk'- 

Pk{Xk) = Pk-l{Xk) ■ flo = (^n'^i^k - Xk-lT + (^n-li^k " Xk-lT~'^ + ■■■ + a^~^ . 

The last equation is a recursive formula to compute from the coefficients of Pk-i{x). 

4. Given the last interval In = [xn,xi), where xi is aliased (xi 1— >• xi + 1.0), do the n-th 
order polynomial least squares fit through data points on that interval: 

PNix) = a^(x - xat)" + a.|^_i(x - xat)""-^ + . . . + af (x - xat) + a^, x G [xat, Xi) = In 

while satisfying two constraints, connectivity: 

Pn{xn) = PN~iixN) : aQ = a^~^{xN - a^jv-i)" + a^Ii{xN - Xn~iT~^ + ... + a^~^ 
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and phase space wrapping: 

Pn{xi) = pi{xi) : al = {xi - xnY' + - xnT~^ + . . . + af (xi - xn) + a^- 

Since has aheady been determined in step 2, and has been determined by the 
connectivity constraint, this constraint is solved for and substitute the resulting 
expression into Pn{x)' 

n 

Pn{x) = + ^ af (x - Xn) [{x - x^f'^ - {xi - xnY'^] ■ 

i=2 

Least squares fit provides n + 1 coefficients a° for the first interval, n coefficients 
for intervals 2-{N — 1), and n — 1 coefficients af for the last interval, totaling nN 
coefficients of the wrapped and connected polynomial chain V{x), x e [xi,xn+i = Xi), 
of piecewise smooth polynomial functions. 

5. Given the polynomial chain V{x), evaluate the sum of squares of residuals: 

j 

for all observed data points {xj,yj) with respective weights Wj. 

6. Once the polynomial chain V{x) is set and its goodness-of-fit value computed, the 
initial set of knots is varied. Given a step amplitude 6, displace each knot by: 

Xk Xk+Af{xk,S), 

where J\f{x, a) is the Gaussian probability distribution function. Repeat steps 2-5 to 
compute xi- If xi < Xi: adopt the new set of knots, else reject it. 

7. Repeat steps 2-6 for the given the number of iterations L. The final Xl ^^^^ correspond 
to the set of knots Xk that best matches the data. 

There are some subtleties of the algorithm that had to be incorporated for our particular 
application: 

a) there has to be an explicit check that the interval Ii contains at least n+1 data points, 
intervals Ik, k — 2 . . .{N — 1), contain at least n data points, and interval /jv contains 
at least n — 1 data points. 
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b) there has to be a shght repulsion between knots so that the algorithm does not push the 
knots together. This is most easily achieved by adding a penalty to the cost function 
of the form e{xk — Xk-i)'"^ for the given penalizing coefficient e. 

c) the last knot is the same as the first knot: for knots there are N intervals and not 

— 1 as one may initially expect. The knot on the phase range boundary should thus 
wrap smoothly over that boundary. 

For detached EB light curves we found that 4 knots of a quadratic polynomial chain set 
initially at {-0.4,-0.1,0.1,0.4} suffice for a 95% success rate in fitting 2681 OGLE EBs. 
Actually, 4 knots of a quadratic polynomial chain are likely to suffice for most photometric 
light curves and can readily be used for classification purposes. The starting set of knots 
was chosen in the following manner: 

1. the data are sorted in phase; 

2. an average fiux / of all data points is computed; 

3. eacli^l data point i/i that is smaller than / is taken as start of a chain. The chain is 
built by all data points yk, k > i, that are smaller than /; 

4. if the chain is shorter than a prescribed threshold (we used 10 elements), it is discarded; 
else it is stored; 

5. in case of two or more chains, the two longest chains are picked; the first and last 
elements of each chain are taken as initial knots. If there are less than 2 chains, fall 
back on default knots: {-0.4, -0.1, 0.1, 0.4}. 

Fig. [19] shows the difficult example of choosing a set of initial knots by the above 
algorithm. The required number of iterations was set to L = 5000. These choices have 
proved to be particularly robust even in the tough cases of well-detached eccentric binaries 
(cf. Fig. [T8l) . The algorithm fails in less than 1% of the tested sample. The time cost of 
fitting 2681 light curves with ~500 data points each is around 2 minutes on a single 2GHz 
processor. 



■^The first chain starts only after a data point that exceeds the average value / has been encountered; 
that is done so that the last chain can be wrapped at the phase end. 
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Initial set of knots Final set of knots 




Phase Phase 



Fig. 18. — Performance of the quadratic 4-knot polyfit algorithm on an eccentric binary 
light curve. Left: the solution of a polynomial chain fit with the default set of initial knots, 
{—0.4, —0.1, 0.1, 0.4}. Right: the solution of the polynomial chain fit after 5000 iterations. 
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Fig. 19. — An example of the automatic knot seeking algorithm performance. This light 
curve corresponds to a well detached, eccentric binary that proved to be difficult to fit 
because of the narrow minima and sparse data points with substantial noise covering the 
primary minimum. Still, polyfit performed reasonably well. 
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B. Network fits to CALEB light curves 

In Section 13.31 we submitted 50 real- world EBs from the CALEB database to the ANN. 
Since published model parameters did not always match the data exactly, direct comparison 
of the results yielded by the network suffers from systematics. As explained in the main 
part of the paper, parameters obtained from the ANN were used to create a synthetic LC 
that was normalized to match the data. Based on the computed residuals we submitted 
the ANN solution to 1, 2, or 3 DC iterations: for values of 1.5 or less, no DC iteration 
was performed and the ANN solution is adopted as final. For values between 1.5 and 
2.0, 1 DC iteration was performed; for values between 2.0 and 2.5, 2 DC iterations were 
performed; finally, for values of 2.5 and higher, 3 DC iterations are performed. Table 
13.31 lists the systems where DC iterations were necessary; here in the Appendix we overplot 
solution LCs to the data to demonstrate the success of the method. 
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Fig. 20. — EBAI model light curves derived for the CALEB data. 
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Fig. 21. — Continued: EBAI model light curves derived for the CALEB data. 
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Fig. 22. — Continued: EBAI model light curves derived for the CALEB data. 
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Fig. 23. — Continued: EBAI model light curves derived for the CALEB data. 
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Fig. 24. — Continued: EBAI model light curves derived for the CALEB data. 
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